Probabilistic topic decomposition of an eighteenth-century American newspaper
Abstract
... vector space model for text data (Salton & McGill, 1983). In this model, each document in a corpus is represented by a term-frequency vector whose elements are the number of occurrences of each word in the vocabulary. Collectively, these term-frequency vectors form the document–word matrix representation of the corpus. All the methods we consider take this document–word matrix as their starting point. The classic information retrieval method, tf-idf (term-frequency inverse-document-frequency), is used in many search engines today. Despite its popularity, tf-idf does not handle synonymy (different words with the same meaning) or polysemy (one word with several meanings). Deerwester, Dumais, Furnas, Landauer, and Harshman (1990) devised Latent Semantic Analysis (LSA) to address this deficiency. Their method for detecting relevant documents from the words in a query improved upon simple word matching. Their association of words with documents (what they called semantic structure) moves us closer to the notion of topics. For example, LSA allows one to compute whether two documents are topically similar, even if the two documents do not have any words in common.

There has been a huge increase in the number of historical primary sources available online.¹ Yet little work has been done on processing, modeling, or analyzing these newly available corpora. Previous studies of historical document collections were limited by the number of items a researcher could analyze in a reasonable amount of time. For instance, Clark and Wetherell (1989) analyzed the Pennsylvania Gazette by sampling less than 10% of the total number of articles from just a 33-year period. Other authors analyzed a single category of a newspaper's content, such as...
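The document–word matrix and tf-idf weighting described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the whitespace tokenizer, the toy sentences, and the particular tf-idf variant (raw count multiplied by log(N/df)) are assumptions, and many other weighting schemes exist.

```python
import math
from collections import Counter

def document_word_matrix(docs):
    """Return (vocabulary, counts), where counts[i][j] is how often
    vocabulary word j occurs in document i."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    counts = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for w, c in Counter(toks).items():
            row[index[w]] = c
        counts.append(row)
    return vocab, counts

def tfidf(counts):
    """Weight each raw count by log(N / df), one common tf-idf variant."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[tf * idf[j] for j, tf in enumerate(row)] for row in counts]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy corpus, not drawn from the Pennsylvania Gazette.
docs = [
    "ship sailed from the harbor",
    "the ship carried rum and the ship carried sugar",
    "the assembly passed a tax",
]
vocab, counts = document_word_matrix(docs)
weights = tfidf(counts)
print(counts[1])  # [0, 1, 0, 2, 0, 0, 0, 1, 0, 2, 1, 0, 2]
print(cosine(weights[0], weights[2]))  # 0.0: only "the" is shared, and its idf is 0
```

Because "the" occurs in every toy document, its idf is log(3/3) = 0 and it contributes nothing to similarity. The first and third documents share no other word, so their tf-idf cosine is 0, which is the word-matching limitation LSA was designed to address.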
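The claim that LSA can judge two documents topically similar even when they share no words can be checked on a toy matrix. The sketch below assumes a hand-built five-document, seven-word count matrix and computes a rank-2 truncated SVD by power iteration on AᵀA with deflation. That method is chosen here only to keep the example dependency-free; a library routine such as `numpy.linalg.svd` would normally be used. Document coordinates are A·V_k, which equals U_k Σ_k in the truncated SVD.

```python
import math

def power_iteration(m, iters=500):
    """Approximate the dominant eigenpair of a symmetric PSD matrix (list of lists)."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        lam = norm  # ||m v|| for unit v converges to the top eigenvalue
    return lam, v

def lsa_doc_vectors(a, k):
    """Rank-k LSA coordinates of the documents (rows of a).

    Eigenvectors of a^T a are the right singular vectors V of a, and
    a @ V_k equals U_k * Sigma_k from the truncated SVD a ~ U_k Sigma_k V_k^T.
    """
    n_terms = len(a[0])
    c = [[sum(row[i] * row[j] for row in a) for j in range(n_terms)]
         for i in range(n_terms)]  # term-term matrix a^T a
    basis = []
    for _ in range(k):
        lam, v = power_iteration(c)
        basis.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(n_terms)]
             for i in range(n_terms)]
    return [[sum(row[t] * v[t] for t in range(n_terms)) for v in basis] for row in a]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy data: columns follow `terms`, rows are five short documents.
terms = ["ship", "voyage", "boat", "harbor", "tax", "bill", "election"]
a = [
    [1, 1, 0, 0, 0, 0, 0],  # "ship voyage"
    [1, 0, 1, 0, 0, 0, 0],  # "ship boat"
    [0, 0, 1, 1, 0, 0, 0],  # "boat harbor"
    [0, 0, 0, 0, 1, 1, 0],  # "tax bill"
    [0, 0, 0, 0, 1, 0, 1],  # "tax election"
]
docs2d = lsa_doc_vectors(a, k=2)
print(cosine(a[0], a[2]))            # 0.0: no shared words
print(cosine(docs2d[0], docs2d[2]))  # close to 1.0 in the latent space
```

"ship voyage" and "boat harbor" have no word in common, so their raw cosine is 0. Both load on the same latent dimension through the bridging document "ship boat", however, so their cosine in the rank-2 space is close to 1, while their similarity to the tax documents stays near 0.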
Similar sources
Reporting Crime in the North of England Eighteenth-Century Newspaper: A Preliminary Investigation
Newspaper reporting and attitudes to crime and justice in late eighteenth and early nineteenth century London
As other sources of printed information about crime, such as the Ordinary’s Accounts of the lives of executed criminals, lost their audience in the final third of the eighteenth century, newspapers came increasingly to dominate printed discussions of crime. However, no substantial study of the overall nature of newspaper reporting on crime and criminal justice issues has yet been undertaken. By...
Luxury in the Eighteenth Century: Debates, Desires and Delectable Goods
Luxury in the Eighteenth Century is a welcome collection of essays on a very important topic. Since the 1982 appearance of the path-breaking The Birth of a Consumer Society, studies of consumption in eighteenth-century Western Europe have proliferated to confirm the thesis that the century experienced a dramatic surge in the production and consumption of goods.(1) This new, handsomely-edited vo...
News from the Hesburgh Libraries of Notre Dame
The recent acquisition of a major microfilm collection titled "Early English Newspapers" has added 1,412 newspapers and broadsides to the Hesburgh Libraries' holdings. This extraordinary purchase was made possible by the discernment and generosity of a group of University benefactors known as "The President's Circle." Many faculty and students are unaware of the wealth of primary sources avail...
Journal: JASIST
Volume 57
Publication date: 2006